map score
FEDEXCHANGE: Bridging the Domain Gap in Federated Object Detection for Free
Yuan, Haolin, Li, Jingtao, Zhuang, Weiming, Chen, Chen, Lyu, Lingjuan
Federated Object Detection (FOD) enables clients to collaboratively train a global object detection model without accessing their local data from diverse domains. However, significant variations in environment, weather, and other domain-specific factors hinder performance, making cross-domain generalization a key challenge. Existing FOD methods often overlook the hardware constraints of edge devices and introduce local training regularizations that incur high computational costs, limiting real-world applicability. In this paper, we propose FEDEXCHANGE, a novel FOD framework that bridges domain gaps without introducing additional local computational overhead. FEDEXCHANGE employs a server-side dynamic model exchange strategy that enables each client to gain insights from other clients' domain data without direct data sharing. Specifically, FEDEXCHANGE allows the server to alternate between model aggregation and model exchange. During aggregation rounds, the server aggregates all local models as usual. In exchange rounds, FEDEXCHANGE clusters and exchanges local models based on distance measures, allowing local models to learn from a variety of domains. As all operations are performed on the server side, clients can achieve improved cross-domain utility without any additional computational overhead. Extensive evaluations demonstrate that FEDEXCHANGE enhances FOD performance, achieving 1.6X better mean average precision in challenging domains, such as rainy conditions, while requiring only 0.8X the computational resources compared to baseline methods.
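The alternation between aggregation rounds and exchange rounds described in this abstract can be illustrated with a minimal Python sketch. Everything below is an assumption made for illustration, not the authors' implementation: models are flat lists of floats, aggregation is an unweighted average, the distance measure is Euclidean, and the hypothetical `exchange` rule simply swaps each model with the most distant unpaired one. The abstract does not specify the actual clustering or exchange rule.

```python
import math


def average(models):
    """Unweighted average of all local models (assumed aggregation rule)."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def l2(a, b):
    """Euclidean distance between two flat parameter vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def exchange(models):
    """Hypothetical exchange rule: swap each model with the most distant unpaired one."""
    unpaired = list(range(len(models)))
    out = list(models)
    while len(unpaired) >= 2:
        i = unpaired.pop(0)
        j = max(unpaired, key=lambda k: l2(models[i], models[k]))
        unpaired.remove(j)
        out[i], out[j] = models[j], models[i]
    return out


def server_step(round_idx, models, exchange_every=2):
    """Return the models sent back to clients; odd rounds exchange (assumed schedule)."""
    if round_idx % exchange_every == 1:
        return exchange(models)
    global_model = average(models)
    return [list(global_model) for _ in models]


local_models = [[0.0, 0.0], [1.0, 1.0], [10.0, 10.0], [11.0, 11.0]]
after_aggregation = server_step(0, local_models)
after_exchange = server_step(1, local_models)
```

Both branches run entirely on the server, which matches the abstract's point that clients incur no extra computation; only the model each client receives changes.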
Table 1 Comparison in terms of MAP scores of two
We thank all the reviewers for their valuable comments. We have fixed all the mistakes and responded to all questions. Given your constructive suggestions, we are confident that we can improve our work further. Therefore, these two types of adversarial samples are different. Please refer to our response to Reviewer 2. Other cross-modal tasks are out of the scope for this paper but can Hamming distance between the generated hash codes, thus resulting in effective perturbations.
Retrieving Time-Series Differences Using Natural Language Queries
Dohi, Kota, Nishida, Tomoya, Purohit, Harsh, Endo, Takashi, Kawaguchi, Yohei
Effectively searching time-series data is essential for system analysis; however, traditional methods often require domain expertise to define search criteria. Recent advancements have enabled natural language-based search, but these methods struggle to handle differences between time-series data. To address this limitation, we propose a natural language query-based approach for retrieving pairs of time-series data based on differences specified in the query. Specifically, we define six key characteristics of differences, construct a corresponding dataset, and develop a contrastive learning-based model to align differences between time-series data with query texts. Experimental results demonstrate that our model achieves an overall mAP score of 0.994 in retrieving time-series pairs. The state of any system can be represented as time-series data consisting of one or multiple channels.
Communication-Efficient Federated Learning Based on Explanation-Guided Pruning for Remote Sensing Image Classification
Klotz, Jonas, Büyüktaş, Barış, Demir, Begüm
Federated learning (FL) is a decentralized machine learning paradigm, where multiple clients collaboratively train a global model by exchanging only model updates with the central server without sharing the local data of clients. Due to the large volume of model updates required to be transmitted between clients and the central server, most FL systems are associated with high transfer costs (i.e., communication overhead). This issue is more critical for operational applications in remote sensing (RS), especially when large-scale RS data is processed and analyzed through FL systems with restricted communication bandwidth. To address this issue, we introduce an explanation-guided pruning strategy for communication-efficient FL in the context of RS image classification. Our pruning strategy is defined based on the layerwise relevance propagation (LRP) driven explanations to: 1) efficiently and effectively identify the most relevant and informative model parameters (to be exchanged between clients and the central server); and 2) eliminate the non-informative ones to minimize the volume of model updates. The experimental results on the BigEarthNet-S2 dataset demonstrate that our strategy effectively reduces the number of shared model updates, while increasing the generalization ability of the global model. The code of this work will be publicly available at https://git.tu-berlin.de/rsim/FL-LRP
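The idea of transmitting only the most relevant parameters can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' method: the relevance scores are taken as given inputs (the LRP computation itself is not shown), and the function name `prune_update` and its `keep_ratio` parameter are hypothetical.

```python
def prune_update(update, relevance, keep_ratio=0.5):
    """Keep only the highest-relevance entries of a flat model update.

    Returns a sparse {index: value} mapping that a client could send
    instead of the full update. Relevance scores are assumed to come
    from an explanation method such as LRP.
    """
    k = max(1, int(len(update) * keep_ratio))
    ranked = sorted(range(len(update)), key=lambda i: relevance[i], reverse=True)
    return {i: update[i] for i in sorted(ranked[:k])}


sparse = prune_update(
    update=[0.1, -0.2, 0.3, 0.4],
    relevance=[0.9, 0.1, 0.5, 0.2],
    keep_ratio=0.5,
)
```

With a keep ratio of 0.5 the client sends half as many values, at the cost of also sending their indices; whether that trade-off pays off depends on the sparsity level and the encoding used.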
Using Pretrained Large Language Model with Prompt Engineering to Answer Biomedical Questions
Our team participated in the BioASQ 2024 Task 12b and Synergy tasks to build a system that can answer biomedical questions by retrieving relevant articles and snippets from the PubMed database and generating exact and ideal answers. We propose a two-level information retrieval and question-answering system based on pre-trained large language models (LLMs), focused on LLM prompt engineering and response post-processing. We construct prompts with in-context few-shot examples and utilize post-processing techniques like resampling and malformed response detection. We compare the performance of various pre-trained LLMs on this challenge, including Mixtral, OpenAI GPT and Llama2. Our best-performing system achieved 0.14 MAP score on document retrieval, 0.05 MAP score on snippet retrieval, 0.96 F1 score for yes/no questions, 0.38 MRR score for factoid questions and 0.50 F1 score for list questions in Task 12b.
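For readers unfamiliar with the retrieval metrics quoted above, the following Python sketch shows the textbook definitions of mean average precision (MAP) and mean reciprocal rank (MRR). It is an illustration only: the official challenge evaluation may use a modified variant (for example, a fixed cutoff or normalizer), and the function names and toy data are hypothetical.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def reciprocal_rank(ranked, relevant):
    """1 / rank of the first relevant item, or 0 if none is retrieved."""
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / rank
    return 0.0


def mean_over_queries(metric, runs):
    """Average a per-query metric over (ranked_list, relevant_set) pairs."""
    return sum(metric(ranked, relevant) for ranked, relevant in runs) / len(runs)


runs = [
    (["a", "b", "c", "d"], {"a", "c"}),  # AP = (1/1 + 2/3) / 2
    (["x", "y", "z"], {"z"}),            # AP = (1/3) / 1
]
map_score = mean_over_queries(average_precision, runs)
mrr_score = mean_over_queries(reciprocal_rank, runs)
```

MAP rewards placing all relevant items early in the ranking, while MRR only looks at the first relevant item, which is why it suits factoid questions with a single expected answer.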
Vessel Re-identification and Activity Detection in Thermal Domain for Maritime Surveillance
Ginige, Yasod, Gunasekara, Ransika, Hewavitharana, Darsha, Ariyarathne, Manjula, Rodrigo, Ranga, Jayasekara, Peshala
Maritime surveillance is vital to mitigate illegal activities such as drug smuggling, illegal fishing, and human trafficking. Vision-based maritime surveillance is challenging mainly due to visibility issues at night, which results in failures in re-identifying vessels and detecting suspicious activities. In this paper, we introduce a thermal, vision-based approach for maritime surveillance with object tracking, vessel re-identification, and suspicious activity detection capabilities. For vessel re-identification, we propose a novel viewpoint-independent algorithm which compares features of the sides of the vessel separately (separate side-spaces) leveraging shape information in the absence of color features. We propose techniques to adapt tracking and activity detection algorithms for the thermal domain and train them using a thermal dataset we created. This dataset will be the first publicly available benchmark dataset for thermal maritime surveillance. Our system is capable of re-identifying vessels with an 81.8% Top1 score and identifying suspicious activities with a 72.4% frame mAP score; a new benchmark for each task in the thermal domain.
DALLMi: Domain Adaption for LLM-based Multi-label Classifier
Beţianu, Miruna, Mălan, Abele, Aldinucci, Marco, Birke, Robert, Chen, Lydia
Large language models (LLMs) increasingly serve as the backbone for classifying text associated with distinct domains and simultaneously several labels (classes). When encountering domain shifts, e.g., classifier of movie reviews from IMDb to Rotten Tomatoes, adapting such an LLM-based multi-label classifier is challenging due to incomplete label sets at the target domain and daunting training overhead. The existing domain adaptation methods address either image multi-label classifiers or text binary classifiers. In this paper, we design DALLMi, Domain Adaptation Large Language Model interpolator, a first-of-its-kind semi-supervised domain adaptation method for text data models based on LLMs, specifically BERT. The core of DALLMi is the novel variation loss and MixUp regularization, which jointly leverage the limited positively labeled and large quantity of unlabeled text and, importantly, their interpolation from the BERT word embeddings. DALLMi also introduces a label-balanced sampling strategy to overcome the imbalance between labeled and unlabeled data. We evaluate DALLMi against the partial-supervised and unsupervised approach on three datasets under different scenarios of label availability for the target domain. Our results show that DALLMi achieves higher mAP than unsupervised and partially-supervised approaches by 19.9% and 52.2%, respectively.
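The interpolation step behind MixUp regularization can be sketched as follows. This is a generic MixUp illustration in Python, not DALLMi's actual loss or sampling procedure: embeddings are plain lists of floats, the mixing coefficient is drawn from a Beta(alpha, alpha) distribution as in standard MixUp, and the names `mixup`, `sample_lambda`, and the value `alpha=0.4` are assumptions.

```python
import random


def mixup(emb_a, emb_b, lam):
    """Convex combination of two embedding vectors: lam * a + (1 - lam) * b."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(emb_a, emb_b)]


def sample_lambda(alpha=0.4, rng=random):
    """Draw a mixing coefficient in [0, 1] from Beta(alpha, alpha)."""
    return rng.betavariate(alpha, alpha)


labeled_embedding = [1.0, 0.0]    # stands in for a positively labeled sample
unlabeled_embedding = [0.0, 1.0]  # stands in for an unlabeled sample
mixed = mixup(labeled_embedding, unlabeled_embedding, lam=0.25)
lam = sample_lambda()
```

Interpolating in embedding space rather than over raw tokens is what makes the operation well defined for text, since discrete tokens cannot be averaged directly.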
Early and Accurate Detection of Tomato Leaf Diseases Using TomFormer
Khan, Asim, Nawaz, Umair, Kshetrimayum, Lochan, Seneviratne, Lakmal, Hussain, Irfan
Tomato leaf diseases pose a significant challenge for tomato farmers, resulting in substantial reductions in crop productivity. The timely and precise identification of tomato leaf diseases is crucial for successfully implementing disease management strategies. This paper introduces a transformer-based model called TomFormer for the purpose of tomato leaf disease detection. The paper's primary contributions include the following: Firstly, we present a novel approach for detecting tomato leaf diseases by employing a fusion model that combines a visual transformer and a convolutional neural network. Secondly, we aim to apply our proposed methodology to the Hello Stretch robot to achieve real-time diagnosis of tomato leaf diseases. Thirdly, we assessed our method by comparing it to models like YOLOS, DETR, ViT, and Swin, demonstrating its ability to achieve state-of-the-art outcomes. For the purpose of the experiment, we used three datasets of tomato leaf diseases, namely KUTomaDATA, PlantDoc, and PlantVillage, where KUTomaDATA is being collected from a greenhouse in Abu Dhabi, UAE. Finally, we present a comprehensive analysis of the performance of our model and thoroughly discuss the limitations inherent in our approach. TomFormer performed well on the KUTomaDATA, PlantDoc, and PlantVillage datasets, with mean average precision (mAP) scores of 87%, 81%, and 83%, respectively. The comparative results in terms of mAP demonstrate that our method exhibits robustness, accuracy, efficiency, and scalability. Furthermore, it can be readily adapted to new datasets. We are confident that our work holds the potential to significantly influence the tomato industry by effectively mitigating crop losses and enhancing crop yields.